Section: New Results

Computational linguistics

Participants: Olivier Catoni, Thomas Mainguy.

In a forthcoming paper, Olivier Catoni and Thomas Mainguy study a new statistical model that learns the syntactic structure of natural languages from a training set of written sentences. The model learns a new type of stochastic grammar and defines a statistical model on sentences. It enforces global constraints, which sets the approach apart from the family of Markov models. At the same time, the grammar generates outputs through a split-and-merge stochastic process that is more elaborate than the production rules defining a context-free grammar. Experiments on small corpora are very encouraging. Working on large corpora will require speeding up the algorithms used to implement the model, as well as some code optimization.
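Since the paper is forthcoming, the exact generative process is not specified here. The following is only a minimal, purely illustrative sketch of what a generic split-and-merge stochastic dynamic over a sentence's constituents might look like; all names (LEXICON, split_merge_step, generate) and the move probabilities are hypothetical and do not describe the authors' actual model.

import random

# Illustrative toy only: constituents are tuples of words; each step either
# splits one multi-word constituent at a random boundary or merges two
# adjacent constituents.  This is a generic sketch, not the model of the paper.

LEXICON = ["the", "cat", "sat", "on", "a", "mat"]  # toy vocabulary (hypothetical)

def split(constituent):
    """Split a multi-word constituent at a random internal boundary."""
    k = random.randrange(1, len(constituent))
    return [constituent[:k], constituent[k:]]

def merge(left, right):
    """Merge two adjacent constituents into one."""
    return left + right

def split_merge_step(constituents, p_split=0.5):
    """One stochastic move: split one constituent or merge two neighbours."""
    cs = list(constituents)
    if random.random() < p_split or len(cs) < 2:
        splittable = [i for i, c in enumerate(cs) if len(c) > 1]
        if splittable:
            i = random.choice(splittable)
            cs[i:i + 1] = split(cs[i])        # replace one constituent by two
    else:
        i = random.randrange(len(cs) - 1)
        cs[i:i + 2] = [merge(cs[i], cs[i + 1])]  # replace two constituents by one
    return cs

def generate(n_words=6, n_steps=10):
    """Sample a toy sentence, then run a few split/merge moves on its segmentation."""
    sentence = tuple(random.choice(LEXICON) for _ in range(n_words))
    constituents = [sentence]                  # start from one undivided constituent
    for _ in range(n_steps):
        constituents = split_merge_step(constituents)
    return constituents

if __name__ == "__main__":
    print(generate())

Running the sketch prints one random segmentation of a toy sentence; the point is only to show how split and merge moves act on constituent structure, unlike the one-shot rewriting rules of a context-free grammar.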